Practical - Week 1
Florencia Grattarola, Friederike Wölke, Gabriel Ortega
(Department of Spatial Sciences)
2023-10-02
Data that place a particular taxon in a particular location and time can take many forms.
Opportunistic incidence records
| PROS | CONS |
|---|---|
| huge amounts of data available, easily aggregated | often without details of effort/method, wide variation in data quality |
Presence-absence data
| PROS | CONS |
|---|---|
| absences are informative, area and effort are measured | less abundant (too time consuming), methods are species-specific |
Repeated surveys
| PROS | CONS |
|---|---|
| standardised protocols, multiple points in time | expensive: geographically restricted, usually temporally too |
Range-maps
| PROS | CONS |
|---|---|
| rough estimates of the outer boundaries of areas within which species are likely to occur | large spatial and temporal uncertainties |
Data can also be classified by how they were collected.
Structured
Semi-structured
Unstructured (opportunistic)
Finally, data can also be classified by how they are made available to others.
Disaggregated
Aggregated
GBIF is an international network and data infrastructure funded by the world’s governments and aimed at providing anyone, anywhere, open access to data about all types of life on Earth.
OBIS is a global open-access data and information clearing-house on marine biodiversity for science, conservation and sustainable development.
eBird’s goal is to gather birdwatchers’ knowledge and experience in the form of checklists of birds, archive it, and freely share it to power new data-driven approaches to science, conservation and education.
iNaturalist is one of the world’s most popular nature apps. It allows participants to contribute observations of any organism, or traces thereof, along with associated spatio-temporal metadata.
Map of Life endeavors to provide ‘best-possible’ species range information and species lists for any geographic area. The Map of Life assembles and integrates different sources of data describing species distributions worldwide.
IUCN’s (International Union for Conservation of Nature) Red List of Threatened Species has evolved to become the world’s most comprehensive information source on the global extinction risk status of animal, fungus and plant species.
rredlist: https://github.com/ropensci/rredlist
BIEN is a network of ecologists, botanists, and computer scientists working together to document global patterns of plant diversity, function and distribution.
SiBBr (Brazilian Biodiversity Information System) is an online platform that integrates data and information about biodiversity and ecosystems from different sources, making them accessible for different uses.
sibbr: https://github.com/sibbr
BBS (Breeding Bird Survey) involves thousands of volunteer birdwatchers carrying out standardised annual bird counts on randomly-located 1-km sites. It’s part of the NBN Atlas.
BioTime is an open-access global database of assemblage time series for quantifying and understanding biodiversity change.
BioTime Hub: https://github.com/bioTIMEHub
PREDICTS uses data on local biodiversity around the world to model how human activities affect biological communities. This biodiversity change is shown as the Biodiversity Intactness Index (BII).
Open means anyone can freely access, use, modify, and share for any purpose.
Darwin Core is the internationally agreed data standard to facilitate the sharing of information about biological diversity.
countryCode: The standard code for the country in which the Location occurs. Recommended best practice is to use an ISO 3166-1-alpha-2 country code.
recordedBy: A list (concatenated and separated) of names of people, groups, or organizations responsible for recording the original Occurrence.
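To make the two terms concrete, here is a minimal sketch of a single record that uses Darwin Core field names. The record itself is hypothetical (the observer names are invented), and the " | " separator follows the Darwin Core recommendation for concatenated lists.

```r
# A hypothetical occurrence record using Darwin Core terms.
# The values are made up for illustration only.
record <- data.frame(
  scientificName = "Castor fiber",
  countryCode    = "CZ",                     # ISO 3166-1 alpha-2 code
  recordedBy     = "A. Novak | B. Svoboda",  # names joined with " | "
  stringsAsFactors = FALSE
)

# Split the concatenated recordedBy field back into individual names
recorders <- trimws(strsplit(record$recordedBy, "|", fixed = TRUE)[[1]])
recorders
```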
Open data are licensed under open licenses. Some examples:
CC0: Public domain
CC-BY: Attribution
CC-BY-NC: Attribution - Non Commercial
CC-BY-SA: Attribution - Share Alike
Data that are standardized and have an open licence can be shared :)
Choose a taxon, choose one data source and try to get distribution data.
Then answer the following questions:
What kind of data types does the source provide?
Which kind of taxa are covered by the database generally?
How accessible is the data? Can anyone download it? Restrictions?
What was your experience? What issues did you encounter while getting the data?
We will use the mammals of the Czech Republic as an example dataset. We will access data through GBIF using tools available in R.
File > New project > New directory or Existing directory
tidyverse.
We will be using many functions from this package, like filter(), mutate(), and later read_csv().
We will use rgbif.
So, let’s get the taxon ID for the Mammalia class
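One way to look up the key is sketched below with name_backbone() from rgbif. It needs an internet connection, the object names mammalia and taxon_key are our own choice, and the exact columns returned may differ between rgbif versions.

```r
library(rgbif)

# Match the name against the GBIF backbone taxonomy
mammalia <- name_backbone(name = "Mammalia", rank = "class")

# usageKey holds the taxon key that the occurrence functions expect
taxon_key <- mammalia$usageKey
taxon_key
```

The key can then be passed on as the taxonKey argument of the occurrence functions.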
And now we can use the function occ_count() to find out the number of occurrence records for the entire Czech Republic.
How many occurrence records are in GBIF for the entire Czech Republic?
And how many records for the mammals of the Czech Republic?
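A possible way to answer both questions is sketched here. It assumes a recent version of rgbif (older versions accepted fewer argument combinations in occ_count()) and an internet connection. The names n_all and n_mammals are ours, and the counts will change as GBIF grows.

```r
library(rgbif)

taxon_key    <- 359   # Mammalia, the key shown in the search summary of this practical
country_code <- "CZ"  # ISO 3166-1 alpha-2 code for the Czech Republic

# All occurrence records for the country
n_all <- occ_count(country = country_code)

# Only the mammal records for the country
n_mammals <- occ_count(taxonKey = taxon_key, country = country_code)

n_all
n_mammals
```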
We are ready to do a download. Whoop!
To do this, we will use occ_search().
occ_search(
taxonKey = NULL,
scientificName = NULL,
country = NULL,
publishingCountry = NULL,
hasCoordinate = NULL,
typeStatus = NULL,
recordNumber = NULL,
lastInterpreted = NULL,
continent = NULL,
geometry = NULL,
geom_big = "asis",
geom_size = 40,
geom_n = 10,
recordedBy = NULL,
recordedByID = NULL,
identifiedByID = NULL,
basisOfRecord = NULL,
datasetKey = NULL,
eventDate = NULL,
catalogNumber = NULL,
year = NULL,
month = NULL,
decimalLatitude = NULL,
decimalLongitude = NULL,
elevation = NULL,
depth = NULL,
institutionCode = NULL,
collectionCode = NULL,
hasGeospatialIssue = NULL,
issue = NULL,
search = NULL,
mediaType = NULL,
subgenusKey = NULL,
repatriated = NULL,
phylumKey = NULL,
kingdomKey = NULL,
classKey = NULL,
orderKey = NULL,
familyKey = NULL,
genusKey = NULL,
establishmentMeans = NULL,
protocol = NULL,
license = NULL,
organismId = NULL,
publishingOrg = NULL,
stateProvince = NULL,
waterBody = NULL,
locality = NULL,
limit = 500,
start = 0,
fields = "all",
return = NULL,
facet = NULL,
facetMincount = NULL,
facetMultiselect = NULL,
skip_validate = TRUE,
curlopts = list(),
...
)
Get occurrence records of mammals from the Czech Republic.
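Judging from the Args line in the summary that follows, the search was probably run with only a taxon key and a country, leaving limit at its default of 500. A sketch under that assumption (the object name mammals_first is ours, and an internet connection is needed):

```r
library(rgbif)

# Only taxonKey and country are set, so the default limit of 500 applies
mammals_first <- occ_search(taxonKey = 359, country = "CZ")

mammals_first             # printing shows the summary and the first rows
class(mammals_first)      # inspect what kind of object came back
nrow(mammals_first$data)  # number of rows actually returned
```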
Records found [6366]
Records returned [500]
No. unique hierarchies [38]
No. media records [500]
No. facets [0]
Args [occurrenceStatus=PRESENT, limit=500, offset=0, taxonKey=359, country=CZ,
fields=all]
# A tibble: 500 × 98
key scien…¹ decim…² decim…³ issues datas…⁴ publi…⁵ insta…⁶ hosti…⁷ publi…⁸
<chr> <chr> <dbl> <dbl> <chr> <chr> <chr> <chr> <chr> <chr>
1 40115… Dama d… 49.2 16.5 cdc,c… 50c950… 28eb1a… 997448… 28eb1a… US
2 40116… Castor… 50.2 14.6 cdc,c… 50c950… 28eb1a… 997448… 28eb1a… US
3 40150… Myocas… 49.7 15.1 cdc,c… 50c950… 28eb1a… 997448… 28eb1a… US
4 40181… Myocas… 50.1 14.4 cdc,c… 50c950… 28eb1a… 997448… 28eb1a… US
5 40149… Sus sc… 49.2 16.5 cdc,c… 50c950… 28eb1a… 997448… 28eb1a… US
6 40149… Dama d… 49.2 16.5 cdc,c… 50c950… 28eb1a… 997448… 28eb1a… US
7 40149… Capreo… 49.6 16.7 cdc,c… 50c950… 28eb1a… 997448… 28eb1a… US
8 40149… Lepus … 49.6 16.7 cdc,c… 50c950… 28eb1a… 997448… 28eb1a… US
9 40149… Myocas… 50.1 14.6 cdc,c… 50c950… 28eb1a… 997448… 28eb1a… US
10 40148… Myocas… 49.8 14.7 cdc,c… 50c950… 28eb1a… 997448… 28eb1a… US
# … with 490 more rows, 88 more variables: protocol <chr>, lastCrawled <chr>,
# lastParsed <chr>, crawlId <int>, basisOfRecord <chr>,
# occurrenceStatus <chr>, taxonKey <int>, kingdomKey <int>, phylumKey <int>,
# classKey <int>, orderKey <int>, familyKey <int>, genusKey <int>,
# speciesKey <int>, acceptedTaxonKey <int>, acceptedScientificName <chr>,
# kingdom <chr>, phylum <chr>, order <chr>, family <chr>, genus <chr>,
# species <chr>, genericName <chr>, specificEpithet <chr>, taxonRank <chr>, …
Check the data output. What’s the format? How many rows does it have?
Get all occurrence records of mammals from the Czech Republic.
Finally, we store the result in the object mammalsCZ.
mammalsCZ <- occ_search(
  taxonKey = taxon_key,       # Key 359 created previously
  country = country_code,     # CZ, ISO code of Czechia
  limit = 6000,               # Max number of records to download
  hasGeospatialIssue = FALSE  # Only records without spatial issues
)
mammalsCZ <- mammalsCZ$data # The output of occ_search is a list with a data object inside. Here we pull the data out of the list.
Mammals occurrence records from the Czech Republic
Rows: 6,000
Columns: 177
$ key <chr> "4011579235", "4011687250", "401505…
$ scientificName <chr> "Dama dama (Linnaeus, 1758)", "Cast…
$ decimalLatitude <dbl> 49.19989, 50.21619, 49.73967, 50.08…
$ decimalLongitude <dbl> 16.52097, 14.64081, 15.08824, 14.41…
$ issues <chr> "cdc,cdround", "cdc,cdround", "cdc,…
$ datasetKey <chr> "50c9509d-22c7-4a22-a47d-8c48425ef4…
$ publishingOrgKey <chr> "28eb1a3f-1c15-4a95-931a-4af90ecb57…
$ installationKey <chr> "997448a8-f762-11e1-a439-00145eb45e…
$ hostingOrganizationKey <chr> "28eb1a3f-1c15-4a95-931a-4af90ecb57…
$ publishingCountry <chr> "US", "US", "US", "US", "US", "US",…
$ protocol <chr> "DWC_ARCHIVE", "DWC_ARCHIVE", "DWC_…
$ lastCrawled <chr> "2023-09-28T05:00:54.279+00:00", "2…
$ lastParsed <chr> "2023-09-28T12:16:10.557+00:00", "2…
$ crawlId <int> 399, 399, 399, 399, 399, 399, 399, …
$ basisOfRecord <chr> "HUMAN_OBSERVATION", "HUMAN_OBSERVA…
$ occurrenceStatus <chr> "PRESENT", "PRESENT", "PRESENT", "P…
$ taxonKey <int> 5220136, 4409131, 4264680, 4264680,…
$ kingdomKey <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
$ phylumKey <int> 44, 44, 44, 44, 44, 44, 44, 44, 44,…
$ classKey <int> 359, 359, 359, 359, 359, 359, 359, …
$ orderKey <int> 731, 1459, 1459, 1459, 731, 731, 73…
$ familyKey <int> 5298, 5493, 3240572, 3240572, 5302,…
$ genusKey <int> 8397832, 3240758, 3240573, 3240573,…
$ speciesKey <int> 5220136, 4409131, 4264680, 4264680,…
$ acceptedTaxonKey <int> 5220136, 4409131, 4264680, 4264680,…
$ acceptedScientificName <chr> "Dama dama (Linnaeus, 1758)", "Cast…
$ kingdom <chr> "Animalia", "Animalia", "Animalia",…
$ phylum <chr> "Chordata", "Chordata", "Chordata",…
$ order <chr> "Artiodactyla", "Rodentia", "Rodent…
$ family <chr> "Cervidae", "Castoridae", "Myocasto…
$ genus <chr> "Dama", "Castor", "Myocastor", "Myo…
$ species <chr> "Dama dama", "Castor fiber", "Myoca…
$ genericName <chr> "Dama", "Castor", "Myocastor", "Myo…
$ specificEpithet <chr> "dama", "fiber", "coypus", "coypus"…
$ taxonRank <chr> "SPECIES", "SPECIES", "SPECIES", "S…
$ taxonomicStatus <chr> "ACCEPTED", "ACCEPTED", "ACCEPTED",…
$ iucnRedListCategory <chr> "LC", "LC", "LC", "LC", "LC", "LC",…
$ dateIdentified <chr> "2023-01-01T19:17:07", "2023-01-02T…
$ coordinateUncertaintyInMeters <dbl> 31, 130, 31, 31, 15, 61, 31, 15, 77…
$ continent <chr> "EUROPE", "EUROPE", "EUROPE", "EURO…
$ stateProvince <chr> "Jihomoravský", "Středočeský", "Stř…
$ year <int> 2023, 2023, 2023, 2023, 2023, 2023,…
$ month <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
$ day <int> 1, 1, 1, 5, 4, 4, 3, 3, 6, 8, 2, 5,…
$ eventDate <chr> "2023-01-01T14:40:23", "2023-01-01T…
$ modified <chr> "2023-01-02T04:44:01.000+00:00", "2…
$ lastInterpreted <chr> "2023-09-28T12:16:10.557+00:00", "2…
$ references <chr> "https://www.inaturalist.org/observ…
$ license <chr> "http://creativecommons.org/license…
$ identifier <chr> "145580826", "145674501", "14584810…
$ facts <chr> "none", "none", "none", "none", "no…
$ relations <chr> "none", "none", "none", "none", "no…
$ isInCluster <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, …
$ datasetName <chr> "iNaturalist research-grade observa…
$ recordedBy <chr> "Marilena Wilding", "slepice_s_fota…
$ identifiedBy <chr> "grigorenko", "Lefebvre Maxence", "…
$ geodeticDatum <chr> "WGS84", "WGS84", "WGS84", "WGS84",…
$ class <chr> "Mammalia", "Mammalia", "Mammalia",…
$ countryCode <chr> "CZ", "CZ", "CZ", "CZ", "CZ", "CZ",…
$ recordedByIDs <chr> "none", "none", "none", "none", "no…
$ identifiedByIDs <chr> "none", "none", "none", "none", "no…
$ country <chr> "Czechia", "Czechia", "Czechia", "C…
$ rightsHolder <chr> "Marilena Wilding", "slepice_s_fota…
$ identifier.1 <chr> "145580826", "145674501", "14584810…
$ http...unknown.org.nick <chr> "marilena_wilding", "slepice_s_fota…
$ verbatimEventDate <chr> "2023-01-01 14:40:23", "2023-01-01 …
$ verbatimLocality <chr> "Stará dálnice, 641 00 Brno-Brno-Že…
$ collectionCode <chr> "Observations", "Observations", "Ob…
$ gbifID <chr> "4011579235", "4011687250", "401505…
$ occurrenceID <chr> "https://www.inaturalist.org/observ…
$ taxonID <chr> "42161", "43793", "43997", "43997",…
$ catalogNumber <chr> "145580826", "145674501", "14584810…
$ institutionCode <chr> "iNaturalist", "iNaturalist", "iNat…
$ eventTime <chr> "14:40:23+01:00", "12:21:24+01:00",…
$ occurrenceRemarks <chr> "Observed in national park Obora Ho…
$ http...unknown.org.captive <chr> "wild", "wild", "wild", "wild", "wi…
$ identificationID <chr> "324197342", "324439694", "32494044…
$ name <chr> "Dama dama (Linnaeus, 1758)", "Cast…
$ recordedByIDs.type <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ recordedByIDs.value <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ informationWithheld <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ lifeStage <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ infraspecificEpithet <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ identifiedByIDs.type <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ identifiedByIDs.value <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ individualCount <int> NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ vernacularName <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ locality <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ higherClassification <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ recordNumber <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ dynamicProperties <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ taxonConceptID <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ http...unknown.org.taxonRankID <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ identificationVerificationStatus <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ taxonRemarks <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ distanceFromCentroidInMeters <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ identificationRemarks <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ sex <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ samplingProtocol <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ dataGeneralizations <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ datasetID <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ language <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ accessRights <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ eventID <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ projectId <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ organismQuantity <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ organismQuantityType <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ otherCatalogNumbers <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ gadm <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ associatedSequences <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ networkKeys <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ coordinatePrecision <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ institutionKey <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ acceptedNameUsage <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ locationRemarks <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ collectionKey <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ preparations <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ institutionID <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ nomenclaturalCode <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ georeferencedBy <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ type <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ disposition <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ bibliographicCitation <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ collectionID <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ elevation <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ elevationAccuracy <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ http...unknown.org.language <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ fieldNumber <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ verbatimIdentification <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ locationAccordingTo <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ georeferencedDate <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ higherGeography <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ georeferenceProtocol <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ footprintWKT <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ georeferenceVerificationStatus <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ endDayOfYear <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ verbatimCoordinateSystem <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ organismID <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ previousIdentifications <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ identificationQualifier <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ georeferenceSources <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ ownerInstitutionCode <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ footprintSRS <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ georeferenceRemarks <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ locationID <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ http...unknown.org.recordID <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ county <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ rights <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ startDayOfYear <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ http...unknown.org.recordEnteredBy <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ establishmentMeans <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ parentNameUsage <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ island <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ materialSampleID <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ associatedReferences <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ verbatimElevation <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ higherGeographyID <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ eventRemarks <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ combinationAuthors <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ verbatimScientificName <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ namePublishedIn <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ combinationYear <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ http...unknown.org.canonicalName <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ verbatimLabel <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ latestEraOrHighestErathem <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ latestEonOrHighestEonothem <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ latestPeriodOrHighestSystem <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ earliestEonOrLowestEonothem <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ earliestEraOrLowestErathem <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ earliestEpochOrLowestSeries <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ earliestPeriodOrLowestSystem <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ latestEpochOrHighestSeries <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ earliestAgeOrLowestStage <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ namePublishedInYear <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ lithostratigraphicTerms <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ verbatimTaxonRank <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ latestAgeOrHighestStage <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA,…
Mammals occurrence records from the Czech Republic
How many records do we have?
How many species do we have?
distinct() is used to see unique values
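The two questions can be answered with nrow() and distinct(). The sketch below runs on a small hand-typed stand-in for mammalsCZ (the first four records of the download shown earlier), so it works without a GBIF download. The name mammals_toy is ours.

```r
library(dplyr)

# Toy stand-in for mammalsCZ: first four records of the download shown earlier
mammals_toy <- tibble(
  species = c("Dama dama", "Castor fiber", "Myocastor coypus", "Myocastor coypus"),
  order   = c("Artiodactyla", "Rodentia", "Rodentia", "Rodentia")
)

# How many records do we have?
n_records <- nrow(mammals_toy)

# How many species do we have? distinct() keeps one row per unique value
n_species <- mammals_toy %>%
  distinct(species) %>%
  nrow()

n_records
n_species
```

With the real data, replace mammals_toy with mammalsCZ.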
Data are not ‘good’ or ‘bad’; their quality depends on our goal.
Some things we can check:
CoordinateCleaner: https://github.com/ropensci/CoordinateCleaner
Automated flagging of common spatial and temporal errors in data.
As an example, we will check the following fields:
basisOfRecord: we want preserved specimens or observations.
taxonRank: we want records at species level.
coordinateUncertaintyInMeters: we want them to be smaller than 10 km.
basisOfRecord: we want preserved specimens or observations
distinct() is used to see unique values.
group_by() is used to group values within a variable.
Note the use of | (OR) to filter the data. Another alternative is filter(basisOfRecord %in% c("PRESERVED_SPECIMEN","HUMAN_OBSERVATION")).
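The three steps for basisOfRecord can be sketched as follows. The data are a hand-made toy table (mammals_toy is our name and the rows are invented for illustration), so the counts say nothing about the real download.

```r
library(dplyr)

# Invented toy records, for illustration only
mammals_toy <- tibble(
  species       = c("Dama dama", "Castor fiber", "Myotis myotis", "Sus scrofa"),
  basisOfRecord = c("HUMAN_OBSERVATION", "HUMAN_OBSERVATION",
                    "PRESERVED_SPECIMEN", "MATERIAL_SAMPLE")
)

# Which values does the field take?
mammals_toy %>% distinct(basisOfRecord)

# How many records per value?
mammals_toy %>%
  group_by(basisOfRecord) %>%
  count()

# Keep preserved specimens or observations, using | (OR)
kept_or <- mammals_toy %>%
  filter(basisOfRecord == "PRESERVED_SPECIMEN" |
           basisOfRecord == "HUMAN_OBSERVATION")

# The same filter written with %in%
kept_in <- mammals_toy %>%
  filter(basisOfRecord %in% c("PRESERVED_SPECIMEN", "HUMAN_OBSERVATION"))
```

Both versions keep the same rows. The %in% form is easier to extend when more values are allowed.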
taxonRank: we want records at species level
coordinateUncertaintyInMeters: we want them to be smaller than 10 km
mammalsCZ %>%
  filter(coordinateUncertaintyInMeters >= 10000) %>%
  select(scientificName,
         coordinateUncertaintyInMeters,
         stateProvince)
# A tibble: 226 × 3
scientificName coordinateUncertaintyInM…¹ state…²
<chr> <dbl> <chr>
1 Myotis nattereri (Kuhl, 1817) 26454 Středo…
2 Myotis myotis (Borkhausen, 1797) 26454 Středo…
3 Myotis myotis (Borkhausen, 1797) 26454 Středo…
4 Myotis myotis (Borkhausen, 1797) 26454 Středo…
5 Rhinolophus hipposideros (Bechstein, 1800) 26454 Středo…
6 Rhinolophus hipposideros (Bechstein, 1800) 26454 Středo…
7 Myotis myotis (Borkhausen, 1797) 26454 Středo…
8 Barbastella barbastellus (Schreber, 1774) 26454 Středo…
9 Barbastella barbastellus (Schreber, 1774) 26454 Středo…
10 Plecotus auritus (Linnaeus, 1758) 26454 Středo…
# … with 216 more rows, and abbreviated variable names
# ¹coordinateUncertaintyInMeters, ²stateProvince
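The query above lists the records we would drop. A sketch of the opposite step, keeping only records that pass all three checks, could look like this. The toy table and the names mammals_clean and mammals_keep_na are ours. Note that filter() silently drops rows where the uncertainty is NA, so whether to keep those is a choice to make explicitly.

```r
library(dplyr)

# Invented toy records, for illustration only
mammals_toy <- tibble(
  species       = c("Dama dama", "Myotis nattereri", NA, "Castor fiber"),
  basisOfRecord = c("HUMAN_OBSERVATION", "PRESERVED_SPECIMEN",
                    "HUMAN_OBSERVATION", "HUMAN_OBSERVATION"),
  taxonRank     = c("SPECIES", "SPECIES", "GENUS", "SPECIES"),
  coordinateUncertaintyInMeters = c(31, 26454, 15, NA)
)

# Strict version: rows with missing uncertainty are dropped as well
mammals_clean <- mammals_toy %>%
  filter(basisOfRecord %in% c("PRESERVED_SPECIMEN", "HUMAN_OBSERVATION")) %>%
  filter(taxonRank == "SPECIES") %>%
  filter(coordinateUncertaintyInMeters < 10000)

# Lenient version: rows with missing uncertainty are kept
mammals_keep_na <- mammals_toy %>%
  filter(basisOfRecord %in% c("PRESERVED_SPECIMEN", "HUMAN_OBSERVATION")) %>%
  filter(taxonRank == "SPECIES") %>%
  filter(coordinateUncertaintyInMeters < 10000 |
           is.na(coordinateUncertaintyInMeters))
```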
coordinateUncertaintyInMeters: we want them to be smaller than 10 km
We’ll get to this next week :)
And finally, a simple trick to produce separate maps per order.
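The text does not say which trick is meant. One common option in ggplot2 is facet_wrap(), which draws one panel per value of a variable. The sketch below assumes that approach and uses the first three records of the download as a toy table (mammals_toy is our name), plotting points only, without a basemap.

```r
library(ggplot2)

# Toy table: first three records of the download shown earlier
mammals_toy <- data.frame(
  order            = c("Artiodactyla", "Rodentia", "Rodentia"),
  decimalLongitude = c(16.52097, 14.64081, 15.08824),
  decimalLatitude  = c(49.19989, 50.21619, 49.73967)
)

# One panel per taxonomic order
p <- ggplot(mammals_toy, aes(x = decimalLongitude, y = decimalLatitude)) +
  geom_point() +
  coord_quickmap() +   # approximate map aspect ratio
  facet_wrap(~order)

p
```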